In this paper, the performance limits, as given by the signal-to-noise
ratio (s/n), are described for different speech-encoding schemes including
adaptive quantization and (linear) adaptive prediction schemes. The
comparison is made on the basis of computer simulations using speech
signals of one speaker sampled at 8 kHz. Bit rates from two to five bits
per sample have been used.

A three-bit-per-sample PCM scheme with a nonadaptive μ100 quantizer
leads to an s/n value of approximately 9 dB. A maximum s/n value of
approximately 25 dB has been reached using an encoding scheme that
includes both adaptive quantization and adaptive prediction. Entropy
coding of the quantizer output symbols leads to an additional gain in
s/n of nearly 3 dB.

I. INTRODUCTION 

Design of an efficient encoding scheme requires some knowledge of
the statistics of the signal. Efforts to improve the performance of PCM
systems have taken two primary directions:

(i) Use of quantizing schemes based on knowledge of the (one- 
dimensional) probability density function (pdf) of the samples 
to be quantized. 
(ii) Use of quantizing schemes exploiting the correlation between 
successive samples. 

If we had an a priori knowledge of the statistics of the samples, a 
nearly optimum quantization scheme could be used consisting of : 

(i) A quantizer matched to the pdf of the signal to be quantized. 
(ii) A predictor optimized for the given autocorrelation function of 
the signal. 

The predictor lowers the variance of the signal to be quantized by
removing the correlation between successive samples. This is done by
subtracting an estimate from each incoming sample; the difference
can be quantized, encoded, and transmitted (differential PCM = DPCM).

In digital speech-encoding systems, we have only a small amount
of a priori knowledge of the statistics, which, in addition, usually
change with time:

(i) The long-period mean level differs from speaker to speaker.
(ii) At a given mean level, the instantaneous level changes because
of variations in speech sounds.
(iii) The correlations between successive samples change because of
variations in speech sounds.

To overcome these problems of unknown statistics, adaptive quan- 
tization and adaptive prediction schemes must be used. In these 
schemes, local estimates of the statistical parameters are calculated. 
The quantizer and/or predictor are then optimized based on these 
estimates. 1-3 

This paper compares different encoding schemes that include : 

(i) Fixed quantizers. 
(ii) Adaptive quantizers. 
(iii) Fixed predictors.
(iv) Adaptive predictors. 

The comparison is done on the basis of computer simulations; the
signal-to-quantization noise ratio (s/n) has been used as the criterion
for the comparisons. It is believed, however, that the s/n understates
the subjectively perceived performance of encoders that have
differential quantizers (the DPCM schemes in this paper). 1
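The s/n criterion can be sketched in a few lines of Python. This is a minimal sketch, assuming the s/n is the ratio of total signal energy to total quantization-error energy over the whole utterance; the function name `snr_db` is hypothetical and is not taken from the simulation program.

```python
import math


def snr_db(original, decoded):
    """Signal-to-quantization-noise ratio in dB (signal energy over error energy)."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have equal length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0.0:
        return math.inf  # decoded signal is identical to the input
    return 10.0 * math.log10(signal_energy / noise_energy)
```

An error amplitude of one tenth of the signal amplitude corresponds to 20 dB under this definition.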

II. DESCRIPTION OF THE ENCODING SCHEMES 

A computer program has been written that allows the simulation of 
encoding schemes combining the possibilities of nonadaptive or 
adaptive quantization and nonadaptive or adaptive prediction. The 
schemes that have been used are described in the following sections. 

2.1 Fixed and adaptive quantizers 

If the quantizer is nonadaptive, its characteristic is assumed to be
logarithmic. Optimum, i.e., s/n-maximizing, quantizers (whether uniform
or nonuniform) cannot be used, not even under the assumption of a
constant mean level, because the idle channel noise is higher for
optimum quantizers than for logarithmic quantizers and results in poorer
subjective performance. 3 - 4 The idle channel noise performance is
determined by the smallest reconstruction level r_1 of the quantizer.
Table I lists these values for various optimum three-bit quantizers (the
term Gauss quantizer refers to a quantizer with an s/n-maximizing
performance for signals with a gaussian pdf, etc.). The gamma pdf
is a good model for speech amplitudes, but the smallest reconstruction
level is 2.4 times higher for the corresponding optimum quantizer than
for the logarithmic quantizer.
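The smallest reconstruction level of a logarithmic quantizer can be estimated with a short sketch. The assumptions here are not stated in the text: a μ-law characteristic with μ = 100, uniform intervals on the compressed axis with reconstruction levels at the expanded interval midpoints, and an overload point of eight standard deviations. The function name is hypothetical.

```python
def mu_law_reconstruction_levels(bits=3, mu=100.0, loading=8.0):
    """Positive reconstruction levels of a mu-law quantizer for unit-variance input.

    Assumed construction: the compressed axis [0, 1] is split into equal
    intervals and each level is the expanded midpoint of its interval.
    """
    half_levels = 2 ** (bits - 1)  # number of levels on the positive side
    levels = []
    for k in range(half_levels):
        c = (k + 0.5) / half_levels              # midpoint on the compressed axis
        expanded = ((1.0 + mu) ** c - 1.0) / mu  # inverse mu-law characteristic
        levels.append(loading * expanded)
    return levels
```

Under these assumptions the smallest level is about 0.062, which agrees with the logarithmic entry in Table I. The agreement suggests, but does not prove, that the table was computed in this way.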

To overcome the problems of unknown mean level and the variations
of the instantaneous level, adaptive quantization schemes (AQ schemes)
can be used. A local estimate σ_x^2 of the variance of the input signal
can be calculated; this value controls the gain of an amplifier located
in front of a quantizer that is optimum for signals with unit variance.
Two schemes are possible:

(i) Forward estimation (AQF): The estimate is calculated
from samples of the input signal. The input signal must be
buffered, and the estimate must be transmitted to the
receiver in addition to the quantized samples. 3 - 5
(ii) Backward estimation (AQB): The estimate is calculated
from quantized samples 3-6; therefore, the state of the amplifier
need not be transmitted (except for synchronizing purposes in
case of channel errors).

Figure 1 shows the structures of the different PCM schemes. Note that
the combination of controlled amplifier and fixed quantizer can be
replaced by a quantizer with a step-size adaptation. Matching the gain
of the amplifier to the signal variance modifies the pdf of the
signal to be quantized. It has been shown that different density
functions can be reached by choosing an appropriate forward estimation
scheme. 3 To get the best s/n performance, quantizers that are
optimum for the specific pdf can be employed.

Table I. Comparison of the smallest reconstruction levels r_1
of different optimum unit-variance three-bit quantizers

                               Nonuniform     Uniform
Type of Quantizer              Quantizer      Quantizer
                               r_1            r_1

Uniform pdf quantizer          -              0.217
Gauss quantizer                0.245          0.293
Laplace quantizer              0.222          0.366
Gamma quantizer                0.149          0.398
Logarithmic μ100 quantizer     0.062          -

2.2 Types of quantizers

The following types of quantizers were used in the simulation of the 
speech-encoding systems : 

Uniform quantizer with different loading factors. 
Logarithmic quantizer with different loading factors. 

Uniform optimum Gauss quantizer. 
Uniform optimum Laplace quantizer. 
Uniform optimum gamma quantizer. 

Nonuniform optimum Gauss quantizer. 
Nonuniform optimum Laplace quantizer. 
Nonuniform optimum gamma quantizer. 

These optimum quantizers lead to a maximum s/n for the specific 
probability density functions. 

2.3 Algorithms of the AQ schemes

In applying adaptive quantization schemes, different possibilities 
of controlling the gain of the amplifier have been used. The following 
notation has been employed for the description of the algorithms (see 
also Figs. 1 and 2) : 

Symbol          Explanation

x(n)            Input sample at time instant n.
y(n)            Quantized sample at time instant n.
I_n             Index of quantizer step at time instant n.
G_n             Gain of the amplifier at time instant n (backward
                estimation).
G_N             Gain of the amplifier used in block N (forward estimation).
M               Number of quantized samples used for calculation of G_n
                (backward estimation).
N               Number of block.
NSEG            Number of input samples used for calculation of G_N
                (forward estimation).
NF              Number of first sample of block N:
                NF = (N - 1)·NSEG + 1.
ρ_N             Vector of short-term autocorrelation coefficients calculated
                from the input samples of block N.
R_N             Toeplitz matrix of short-term autocorrelation coefficients
                calculated from the input samples.
α, β, α_j, ε    Coefficients to be optimized for each algorithm.

In all AQ schemes, a local estimate of the quantizer input signal variance
is calculated; this value determines the gain of the amplifier such that
the quantizer is optimally loaded.

Fig. 1. PCM encoding schemes. (a) Nonadaptive PCM. (b) Adaptive PCM with
forward estimation (PCM-AQF). (c) Adaptive PCM with backward estimation
(PCM-AQB). x(n) = input signal, y(n) = quantized signal, Q = quantizer,
GC = gain control, B + GE = buffer and gain estimation.

2.3.1 Forward estimation schemes (AQF)

In the AQF schemes, the gain is readjusted only once for each new block
of NSEG speech samples:

$$G_N = \text{const.}; \quad n = NF, NF + 1, \ldots, NF + NSEG - 1$$
$$NF = (N - 1) \cdot NSEG + 1.$$

Fig. 2. AQ scheme. x(n) = input signal, y(n) = quantized signal, G_n = gain
used at time instant n (AQB scheme), G_N = gain used in block N (AQF scheme).

The following algorithms have been used:

(i) PCM: variance scheme 3: An unbiased estimate of the variance
of the block is calculated:

$$G_N^{-2} = \alpha \cdot \frac{1}{NSEG - 1} \sum_{j=1}^{NSEG} x^2(NF - 1 + j) \qquad (1)$$

G_N is proportional to the inverse of the standard deviation
estimated from the samples of the block.
(ii) PCM: maximum scheme: The maximum amplitude in the block
is used:

$$G_N^{-1} = \alpha \cdot \max_{j = 1, \ldots, NSEG} \{ |x(NF - 1 + j)| \} \qquad (2)$$

(iii) DPCM: maximum scheme 1: The maximum difference between
neighboring samples is used:

$$G_N^{-1} = \alpha \cdot \max_{j = 2, \ldots, NSEG} \{ |x(NF - 1 + j) - x(NF - 2 + j)| \} \qquad (3)$$

This algorithm can only be used for predictors with one
coefficient.
(iv) ADPCM: variance scheme: The vector of short-term autocorrelation
coefficients is used to calculate an estimate of
the variance σ_d^2 of the difference signal:

$$G_N^{-2} = \alpha \cdot \sigma_d^2 = \alpha \cdot \left[ \sigma_x^2 - \rho_N^T R_N^{-1} \rho_N \right] \qquad (4)$$
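The block-wise gain calculation of the variance and maximum schemes can be sketched as follows. This is a minimal sketch, not the simulation program: the function names, the `floor` guard against silent blocks, and the normalization by NSEG - 1 are assumptions (a different normalization only rescales the coefficient α, which is optimized in any case).

```python
import math


def aqf_gain_variance(block, alpha=1.0, floor=1e-12):
    """Variance scheme: gain is the inverse of the scaled block standard deviation."""
    variance = sum(x * x for x in block) / (len(block) - 1)
    return 1.0 / math.sqrt(alpha * max(variance, floor))


def aqf_gain_maximum(block, alpha=1.0, floor=1e-12):
    """Maximum scheme: gain is the inverse of the scaled peak amplitude."""
    peak = max(abs(x) for x in block)
    return 1.0 / (alpha * max(peak, floor))


def aqf_gains(samples, nseg, gain_fn, alpha=1.0):
    """One gain per complete block of nseg samples (forward estimation)."""
    gains = []
    for start in range(0, len(samples) - nseg + 1, nseg):
        gains.append(gain_fn(samples[start:start + nseg], alpha))
    return gains
```

Each gain would be held constant for all samples of its block and sent to the receiver as side information.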

2.3.2 Backward estimation schemes (AQB) 

In AQB schemes, the gain of the amplifier is, in general, modified for
every new input sample by a factor that depends on the
previous quantized samples or on the corresponding quantizer indices:

$$G_n = \alpha \cdot G_{n-1}.$$

The following algorithms have been used : 

(i) One-word memory scheme 6: The last gain value is multiplied by
a factor that depends on the last occupied quantizer step:

$$G_n = f(|I_n|) \cdot G_{n-1} \qquad (5)$$

(ii) Variance scheme 10: The last M quantized samples and (for
β ≠ 0) the last gain value are used to calculate a new gain
value:

$$G_n^{-2} = \sum_{j=1}^{M} \alpha_j \, y^2(n - j) + \beta \cdot G_{n-1}^{-2} \qquad (6)$$

(iii) Modified one-word memory scheme 3: The gain of the amplifier is
changed if the smallest reconstruction level has been occupied
α times or if the largest reconstruction level has been occupied
once:

$$G_n = \begin{cases}
2.0 \cdot G_{n-1} & \text{if } |I_m| = \min \text{ for } \alpha \text{ times } (m = n, n - 1, \ldots, n - \alpha + 1) \\
0.5 \cdot G_{n-1} & \text{if } |I_n| = \max \\
1.0 \cdot G_{n-1} & \text{otherwise.}
\end{cases} \qquad (7)$$
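The gain update of the modified one-word memory scheme can be sketched directly from eq. (7). The function name and argument layout are hypothetical. Whether the run counter is reset after a gain change is not stated in the text, and this sketch does not reset it.

```python
def update_gain(prev_gain, index_history, alpha, min_index, max_index):
    """Return the new amplifier gain according to eq. (7).

    index_history holds the magnitudes |I_m| of the quantizer indices,
    oldest first, with the current index last.
    """
    if index_history[-1] == max_index:
        return 0.5 * prev_gain  # largest level occupied once: halve the gain
    recent = index_history[-alpha:]
    if len(recent) == alpha and all(i == min_index for i in recent):
        return 2.0 * prev_gain  # smallest level occupied alpha times in a row
    return prev_gain            # otherwise the gain is left unchanged
```

Because the update uses only quantizer indices, the receiver can repeat it without side information, provided no channel errors occur.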



2.4 Fixed and adaptive predictors 

In predictive encoding systems, an estimate of each input sample is
calculated and subtracted from the actual input sample; the difference
is then quantized, encoded, and transmitted. The use of nonadaptive
predictors (DPCM schemes) leads to a suboptimum overall performance
of the encoding scheme, because the prediction is not optimum for all
speakers and for all speech sounds. 1 - 8
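The predict, subtract, and quantize loop can be sketched as follows. This is a minimal sketch under stated assumptions: a single fixed predictor coefficient, a uniform mid-rise quantizer in place of the logarithmic one, and prediction from decoded samples so that the receiver can form the same estimate. The function name and the default values of `coeff`, `step`, and `levels` are purely illustrative.

```python
import math


def dpcm_encode_decode(samples, coeff=0.85, step=0.25, levels=8):
    """First-order DPCM; returns (quantizer indices, decoded samples)."""
    half = levels // 2
    indices, decoded = [], []
    prev = 0.0  # last decoded sample, shared by encoder and decoder
    for x in samples:
        estimate = coeff * prev               # prediction from decoded data
        diff = x - estimate                   # difference signal
        idx = math.floor(diff / step)         # uniform mid-rise quantizer
        idx = max(-half, min(half - 1, idx))  # clip to the available levels
        prev = estimate + (idx + 0.5) * step  # decoder reconstruction
        indices.append(idx)
        decoded.append(prev)
    return indices, decoded
```

As long as the difference signal stays inside the quantizer range, the reconstruction error equals the quantization error of the difference signal and is bounded by half a step.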

A better prediction can be reached by using adaptive algorithms
(ADPCM schemes). Two schemes are possible:

(i) Forward scheme: A short-time autocorrelation function is
calculated using a finite number of buffered input samples. The
predictor coefficients are readjusted according to the
time-variant autocorrelation function. 2 - 11
(ii) Backward scheme: The predictor is optimized using the quantized
information (gradient search method and Kalman filter
algorithm). 1 - 2

Only the forward scheme has been used in the simulations. The
optimum vector h_N of J predictor coefficients for each block N is

$$h_N = R_N^{-1} \rho_N. \qquad (8)$$

R_N and ρ_N are the matrix and the vector of short-term autocorrelation
coefficients calculated from the input samples of block N. The predictor
coefficients have to be transmitted to the receiver, in addition to the
code words of the quantized difference signal samples. An upper bound
of the gain in s/n as compared to PCM is given in Section IV.
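Because R_N is a Toeplitz matrix, eq. (8) can be solved without an explicit matrix inversion. The sketch below uses the Levinson-Durbin recursion, a standard solver for such systems; the text does not say which method the simulation program used, and the function name is hypothetical. The second return value is the residual variance, i.e., the bracketed term of eq. (4).

```python
def levinson_durbin(r, order):
    """Solve R h = rho for a Toeplitz autocorrelation matrix.

    r[0..order] are autocorrelation values at lags 0..order.
    Returns (h, error), where h holds the predictor coefficients and
    error = r[0] - rho^T h is the variance of the difference signal.
    """
    h = [0.0] * order
    error = r[0]
    for m in range(order):
        if error <= 0.0:
            break  # signal is perfectly predictable; stop the recursion
        acc = r[m + 1] - sum(h[j] * r[m - j] for j in range(m))
        k = acc / error  # reflection coefficient of stage m + 1
        new_h = h[:]
        new_h[m] = k
        for j in range(m):
            new_h[j] = h[j] - k * h[m - 1 - j]
        h = new_h
        error *= 1.0 - k * k
    return h, error
```

For example, autocorrelation values 1.0, 0.5, and 0.2 at lags 0 to 2 give two coefficients of about 0.533 and -0.067 and a residual variance of about 0.747.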

III. RESULTS 

Various nonadaptive and adaptive encoding schemes have been
simulated on a digital computer. The signal-to-quantization noise
ratio (s/n) has been determined using 8-kHz-sampled speech samples
of one speaker. The same 2.3-s utterance ("The boy was mute about
his task"; female voice; bandwidth 200 to 3200 Hz) has been used in
all simulations. (The simulations have not included any high-frequency
emphasis of the input speech, as is characteristic of a 500-type set
transmitter, for example.)

Fig. 3. DPCM scheme. Q = quantizer.

The following schemes have been studied:

(i) Nonadaptive Quantization
    PCM: see Fig. 1.
    DPCM (nonadaptive prediction): see Fig. 3.
    ADPCM (adaptive prediction): see Fig. 4.
(ii) Adaptive Quantization
    PCM: see Fig. 1.
        Forward scheme (PCM-AQF).
        Backward scheme (PCM-AQB).
    DPCM (nonadaptive prediction): see Fig. 5.
        Forward scheme (DPCM-AQF).
        Backward scheme (DPCM-AQB).
    ADPCM (adaptive prediction): see Fig. 6.
        Forward scheme (ADPCM-AQF).
        Backward scheme (ADPCM-AQB).

These encoding schemes have been optimized using the s/n as criterion.



Fig. 4. ADPCM scheme. B + CE = buffer and coefficients estimator.

Fig. 5. DPCM-AQ schemes (DPCM-AQF and DPCM-AQB). B + GE = buffer and gain
estimator, GC = gain control.

3.1 Optimum results: three bits/sample quantization 

Figure 7 shows the optimum results reached with a three-bit quan- 
tization of the 2.3-s speech sample. 

Left curves: Optimum results using a fixed quantizer.
Right curves: Optimum results using an adaptive quantizer.
Lower curves: Prediction with a first-order predictor (one coefficient).
Upper curves: Prediction with a high-order predictor.



Fig. 6. ADPCM-AQ schemes (ADPCM-AQF and ADPCM-AQB). B + GCE = buffer and
gain and coefficients estimator.

Fig. 7. Signal-to-noise ratio values and gains (over logarithmic PCM) for
different three-bit speech-encoding systems.

Quantizers with a logarithmic characteristic have been used in all
simulations with a fixed quantizer (curves on the left side of Fig. 7).
The s/n value for a PCM scheme is

s/n = 8.7 dB

if the quantizer has a μ100 characteristic and if the loading is 4σ_x (σ_x
is the standard deviation of the signal to be quantized). As compared
to this s/n value of 8.7 dB, the following maximum gains can be
reached with prediction schemes using the same type of quantizer
(G* is the gain in s/n over PCM):

Fixed predictor, fixed quantizer: G* ≈ 7 dB
Adaptive predictor, fixed quantizer: G* ≈ 11 dB.

Adaptive quantization (PCM-AQ) not only has the advantage of
increasing the dynamic range that the quantizer can handle, but it
also allows the application of quantizers that are optimum for the
probability density function of the signal to be quantized. The
following gain over the 8.7-dB value of nonadaptive PCM has been reached:

Adaptive quantization: G* ≈ 7 dB.

Using predictors, the gains over PCM are now

Fixed predictor, adaptive quantizer: G* ≈ 12 dB
Adaptive predictor, adaptive quantizer: G* ≈ 16 dB.

Fig. 8. Comparison of waveforms. x(n) = sequence of input samples (512
samples), y(n) = sequence of decoded samples, q(n) = sequence of
quantization errors, G_N = sequence of amplifier gains.

Figure 8 shows the waveforms of the reconstructed signal and of the
quantization error for a 64-ms segment of speech. Three examples are
shown:

(i) PCM, nonadaptive, μ100 characteristic. Only eight different
levels can be used for the reconstruction (decoding) of the
signal.
(ii) PCM-AQF, optimum Gauss quantizer, NSEG = 32. The number
of levels is limited to eight for each segment of NSEG samples.
Different levels can be used for each segment.
(iii) ADPCM12-AQF, optimum Laplace quantizer, NSEG = 128. The
predictive encoding with a 12th-order predictor leads to a very
high s/n. For each segment, the number of levels of the
difference signal is limited to eight, but the reconstructed signal
does not suffer this limitation.

3.2 Adaptive delta modulation 

To determine whether the quantization schemes represent an
improvement over existing adaptive delta modulation (ADM) schemes, the
s/n value of Jayant's ADM scheme 12 has been determined at a bit rate
of 24 kb/s. The s/n value is approximately 15 dB. 7 Therefore, the gain
over nonadaptive three bits/sample PCM is

Adaptive delta modulation: G* ≈ 6 dB.

3.3 Entropy coding 

Entropy coding is a variable-length coding procedure that assigns 
short code words to highly probable symbols and longer code words to 
less probable symbols. The average word length is approximately equal 
to the entropy of the quantizer output signal. The entropy coding 
technique leads to an additional gain in s/n for a given average bit 
rate. The number of quantizer steps can be increased without exceed- 
ing an average bit rate of three bits per sample. The dashed lines in 
Fig. 7 show the s/n values that can be reached by using an entropy 
coding technique. In this case, uniform quantizers with a large number 
of steps have been employed; the step sizes have been adjusted to 
give a quantizer output entropy of three bits. It should be noted that 
a buffer is needed so that the variable-length coded signal can be 
transmitted over a channel at a uniform bit rate. 
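The entropy of the quantizer output, which the average code word length approaches, can be estimated from the relative frequencies of the quantizer indices. This is a minimal sketch of a first-order (memoryless) estimate; the function name is hypothetical.

```python
import math
from collections import Counter


def output_entropy_bits(indices):
    """First-order entropy, in bits per sample, of a sequence of quantizer indices."""
    counts = Counter(indices)
    total = len(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

In a step-size search of the kind described above, the step size of a many-level uniform quantizer would be varied until this estimate reaches the target of three bits.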

3.4 Optimum results: two bits/sample up to five bits/sample quantization

Figure 9 shows the s/n values for quantizations with two bits/sample
up to five bits/sample (corresponding to bit rates from 16 kb/s up to
40 kb/s). The following encoding schemes have been compared:

PCM            μ100 characteristic, 8σ_x loading.
PCM-AQF        NSEG = 32, optimum Gauss quantizer.
DPCM1-AQB      1 predictor coefficient, fixed; optimum Gauss quantizer.
ADPCM1-AQF     1 predictor coefficient, adaptive; optimum Gauss
               quantizer; NSEG = 32.
ADPCM4-AQF     4 predictor coefficients, adaptive; optimum Laplace
               quantizer; NSEG = 128.
ADPCM12-AQF    12 predictor coefficients, adaptive; optimum gamma
               quantizer; NSEG = 256.

Fig. 9. Signal-to-noise ratio values for quantization with two bits per
sample (16 kb/s) up to five bits per sample (40 kb/s).

3.5 Parameter transmission in adaptive encoding schemes

In all forward schemes, channel capacity is needed for transmission
of the adaptive parameters. The problems and techniques of quantizing
these parameters are not considered in this paper. It is known that the
parameters tolerate coarse quantization and slow updating. If
necessary, redundancy-reducing schemes can be used to lower the number of
bits that have to be transmitted in addition to the encoded speech
samples. The needed channel capacity can be approximately
transformed into an equivalent loss in s/n. If each parameter of the
adaptive scheme has to be encoded with NADD bits/segment, and if NSEG
is the number of samples/segment, then we get an equivalent reduction
in s/n performance:

$$\Delta\,\mathrm{s/n} = 6.02 \cdot \frac{NADD \ \text{(bits/segment)}}{NSEG \ \text{(samples/segment)}} \ \mathrm{dB} \qquad (9)$$

This loss is due to the reduction of the number of quantizer steps in
order not to exceed the maximum allowed bit rate.
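The equivalent loss of eq. (9) is simple to evaluate. The sketch below only restates the formula; the function name is hypothetical.

```python
def side_info_loss_db(nadd_bits, nseg_samples):
    """Equivalent s/n loss in dB for one parameter, following eq. (9)."""
    return 6.02 * nadd_bits / nseg_samples
```

Since the loss is counted per parameter, a scheme that transmits several parameters per segment would accumulate the corresponding multiple under this approximation.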

Example : 

NSEG = 128 (16 ms) 
NADD = 4 bit. 

The loss is approximately 0.2 dB for each coefficient to be transmitted.

IV. UPPER BOUNDS FOR PREDICTION

The linear dependencies between the amplitudes of the speech 
sample being used in all simulations have been calculated to get a 
measure of the maximum gain that can be reached with linear predic- 
tion. Note that these upper bounds of the prediction gain cannot be 
reached with predictive encoding systems (especially if the quantizer 
has only a low number of quantization levels), because prediction is 
done then with decoded speech samples. These samples include a 
quantization error. 

4.1 Nonadaptive prediction 

The long-term autocorrelation function of the speech signal has been
measured. Figure 10 shows the first 19 time lags of the normalized
autocorrelation function ρ(n). Using these data, a predictor can be
optimized such that the variance of the difference signal is minimum.

The prediction gain is the ratio of the variances of the input signal
and the difference signal:

$$G_P = 10 \log_{10} \frac{\sigma_x^2}{\sigma_d^2} \ \mathrm{dB}. \qquad (10)$$

Fig. 10. Normalized autocorrelation function versus time lag n (female
voice; 200 to 3200 Hz).

Fig. 11. Prediction gains vs number of predictor coefficients.

G_P can be calculated directly from the normalized autocorrelation
function ρ(n). Figure 11 shows this gain versus the number of
coefficients being used for the prediction. The maximum prediction gain is
approximately 10.5 dB. This value is an upper bound of the additional
gain in s/n over PCM by using nonadaptive differential encoding
schemes. This gain cannot be reached if the DPCM encoder has to handle
speech samples of different speakers. In this case, suboptimum
predictor coefficients have to be chosen such that the DPCM encoder has
a good performance for all speakers. This demand can only be fulfilled
with predictors of low order (up to three coefficients). Such
suboptimum predictor coefficients have been used in the simulation of
the DPCM schemes.
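For a single predictor coefficient the bound can be worked out by hand. The following is a sketch under the assumption of a stationary zero-mean signal; the value ρ(1) = 0.85 is purely illustrative and is not read from Fig. 10.

```latex
h_1 = \rho(1), \qquad
\sigma_d^2 = \sigma_x^2 \left[ 1 - \rho^2(1) \right], \qquad
G_P = -10 \log_{10} \left[ 1 - \rho^2(1) \right] \ \mathrm{dB}

\rho(1) = 0.85: \quad G_P = -10 \log_{10}(0.2775) \approx 5.6 \ \mathrm{dB}
```

The gain grows rapidly as ρ(1) approaches one, which is why strongly correlated speech sounds benefit most from prediction.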

Knowing the long-term autocorrelation function ρ(n), it is possible
to calculate an approximation of the power density function. This is
done by calculating the power transfer function of a recursive filter,
the coefficients of which are equivalent to the coefficients of the
optimum predictor (maximum-entropy method 13). Figure 12 shows the
power density spectrum calculated in this way from 16 coefficients of
the autocorrelation function ρ(n).

4.2 Adaptive prediction 

NSEG samples of the input signal are buffered, and the short-term
autocorrelation function of this segment is calculated. For each
segment of NSEG samples, the variance of the difference signal can be
calculated directly from this short-term autocorrelation function [see
eq. (4)]. Using these variances, a prediction gain can be determined for
different numbers of predictor coefficients and for different values of
NSEG. Figure 11 shows the optimum prediction gain for an adaptive
prediction scheme versus the number of predictor coefficients. In each
case, the optimum value of NSEG has been used. The maximum prediction
gain is approximately 14 dB. This value is an upper bound of the
additional gain in s/n over PCM by using adaptive differential encoding
schemes.

Fig. 12. Power density of the speech signal.

V. UPPER BOUNDS FOR QUANTIZATION 

It is possible to design quantizers such that the
signal-to-quantizing-noise ratio is a maximum; this is done by choosing
the quantizer step sizes according to the probability density function
of the signal. It is known that these optimum quantizers cannot be used
for the quantization of speech signals: the s/n improvement is offset by
the greater idle channel noise and smaller dynamic range (Ref. 4; see
also Section 2.1 above). Optimum quantization is practical, however, if
used in an adaptive quantization scheme; it gives us an s/n advantage
over logarithmic quantization, and it allows a further increase in s/n
by using entropy coding techniques (variable length coding). The adaptive
quantization technique changes the pdf of the signal to be quantized;
it has been shown 3 that different density functions can be reached with
the forward estimation scheme (AQF scheme). Table II shows the s/n
values for three-bit quantizers without and with entropy coding. The
values of the first two columns are taken from Max 14 and Paez and
Glisson. 15 In the case of entropy coding, the quantizers have been
optimized so that the s/n is maximum for the given average bit rate
of three bits per sample. 16 It is not possible to get higher s/n values with
any encoding scheme based on memoryless single-letter quantization.

Table II. Maximum s/n values of various three-bit quantizers

                 Quantizer Without Entropy Coding
                 Optimum        Optimum           Quantizer With
                 Uniform        Nonuniform        Entropy Coding
                 Quantizer      Quantizer
                 s/n (dB)       s/n (dB)          s/n (dB)

Uniform pdf      18.06          (18.06)           18.06
Gaussian pdf     14.27          14.62             16.53
Laplace pdf      11.44          12.61             17.09
Gamma pdf         8.78          11.47             18.78


VI. CONCLUSIONS 

Comparisons of various nonadaptive and adaptive three-bit
speech-encoding systems via simulation with speech inputs show that a
wide range of signal-to-quantization-noise ratios can be reached,
starting with 9 dB (logarithmic PCM) and increasing up to 27 dB (adaptive
predictive coding with adaptive quantization and entropy coding).
Adaptive quantization has an s/n advantage of 7 or 5 dB over a fixed
logarithmic quantizer when used in encoding schemes without and with
prediction, respectively. Nonadaptive prediction leads to a 7-dB
increase in s/n, and 11 dB can be gained using adaptive prediction
techniques. Entropy coding gives an additional 2 to 3 dB improvement;
such a coding technique is difficult to implement if a constant bit rate
has to be achieved, but it may be of interest for asynchronous data
networks. Furthermore, subjectively, DPCM gains over logarithmic PCM are
believed to be greater than what the s/n gains suggest. 1

Informal listening tests have shown that all predictive encoding
schemes give a very good speech quality when used in connection with
adaptive quantization (DPCM-AQ or ADPCM-AQ). Differences between
the original speech and the decoded speech are not audible with
adaptive prediction schemes when a high-order predictor is used (for
example, ADPCM4-AQF).

The upper bounds that have been determined separately for the 
prediction gains and the quantizer s/n performances cannot be reached 
in practical predictive encoding systems. This fact is attributed to the 
predictor-quantizer interaction; that is, the input to the predictor is 
a noisy version of the input signal, and the input to the quantizer is a 
noisy prediction error. This interaction is not negligible when three-bit 
quantizers are used. 

It is important to realize that all results are based on a single speech
record of one speaker. Computer simulations using other speech
material show basically similar results; the main differences appear in
the exact prediction gains that can be reached. In many instances,
these gains are higher than those mentioned in this paper.

One object of this paper was to quantify the (relative and absolute) 
capabilities of a wide range of nonpitch-tracking speech coders. The 
coders studied have a variety of potential applications that call for 
different specifications of speech quality and coder complexity. A 
second purpose of this paper was to study the capabilities of three-bit 
encoding in some detail, as motivated by mobile telephone studies. 17 - 18